In this brief article I show how the notion of coarse graining and the Renormalization Group
enter naturally in the dynamics of genetic systems, in particular in the presence of
recombination. I show how the latter induces a dynamics wherein coarse grained and fine grained
degrees of freedom are naturally linked as a function of time, leading to a hierarchical
dynamics that has a Feynman-diagrammatic representation. I show how this coarse grained
formulation can be exploited to obtain new results.

PACS: 05.10.Cc, 87.10.+e, 87.23.Kg, 89.75.-k 

1 Introduction 

The Renormalization Group (RG), in its many diverse guises, has proved to be an immensely
powerful and useful tool in the treatment of systems with many degrees of freedom, with
applications covering a huge gamut, from relativistic quantum field theory to the asymptotics of
differential equations. In this short article I discuss another arena where the RG appears in a very
natural way: genetic dynamics.

By genetic dynamics I mean the dynamics of string-, or tree-like objects whose evolution 
is governed by a set of genetic operators, the most common of which are selection, mutation 
and recombination. Selection and mutation have been extensively studied by physicists (see 
for example [1]). Recombination however remains relatively untouched, although it has been 
extensively studied in biology (see for example [2]). The chief areas of interest are population 
genetics, and associated fields, and evolutionary computation. In both fields one may
be dealing with a very large number of degrees of freedom and hence the normal RG motivation of
reducing degrees of freedom is valid. For instance, a typical protein has $O(10^4)$ amino acids.

Here, I will concentrate more on showing that recombination naturally induces a coarse
graining and that the subsequent dynamics possesses a hierarchical structure wherein genetic
configurations are related in the past to more coarse grained genetic "building blocks". I show that
such coarse grainings have an RG structure and lead to new results and insights that can be, and
have been, profitably used - principally in evolutionary computation. Although the flavour of
this article is quite different, many of the results and more details can be found in the following
articles [3-7].

E-mail address: Stephens@nuclecu.unam.mx



Institute of Physics, SAS, Bratislava, Slovakia 






C. R. Stephens 



2 Genetic Dynamics 



Consider the dynamics of a population $\mathcal{M}(t) = \{c_i(t)\} \subset \Omega$ of strings of
equal length $N$ (see footnote 2), where $\Omega$ is the configuration space of string states and
$\{c_i(t)\}$ is the set of "genotypes" present in the population at time $t$. For simplicity we
will assume binary bits, though nothing we shall present depends on this fact. We may naturally
represent $\Omega$ as an $N$-dimensional hypercube, a natural metric being the Hamming distance,
strings associated with adjacent vertices being Hamming distance one apart. The $N$ string loci
form a complete orthonormal basis for the hypercube.
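The hypercube picture can be made concrete for a small string length. The following is a minimal sketch, assuming three-bit binary strings; the function names `configuration_space` and `hamming` are illustrative and do not come from the article.

```python
from itertools import product

def configuration_space(n_bits):
    """All 2^N binary strings of length N, i.e. the vertices of the hypercube."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]

def hamming(a, b):
    """Number of loci at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

omega = configuration_space(3)
# Adjacent vertices of the hypercube are the strings at Hamming distance one.
neighbours = [c for c in omega if hamming("000", c) == 1]
print(len(omega), neighbours)
```

Each vertex has exactly $N$ neighbours, one for each locus that can be flipped.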

The population evolves in discrete time under the action of an evolution operator $\mathcal{H}$,
the action of which depends on the specific "genetic" operators involved. Here, we will consider the
three canonical operators - selection, mutation and recombination - whereupon we have

$$\mathcal{M}(t+1) = \mathcal{H}(\{f_i\}, \{\mathcal{M}(t)\}, \{p_k\}, t)\,\mathcal{M}(t) \qquad (1)$$

In this case $\mathcal{H}$ depends on the reproductive fitness landscape, $\{f_i\}$, the population
$\{\mathcal{M}(t)\}$ and the set of parameters, $\{p_k\}$, that govern the other genetic operators,
e.g. mutation and recombination probabilities. For selection
$P_i(t+1) = P'_i(t) = \sum_j F_{ij} P_j(t)$, where $F_{ij}$ is the fitness matrix and $P_i(t)$ is
the probability of finding the string $c_i$ at time $t$. One of the most widely used selection
schemes is proportional selection, where the dynamics is given by
$F_{ij} = (f_i/\bar f(t))\,\delta_{ij}$, where $\bar f(t)$ is the average population fitness. It is
usually considered as a unary operator.

Mutation is also a unary operator and, typically, is such that every string bit flips to its
complement with probability $p_m$ every generation. Recombination, in distinction, is a binary
operator (although higher cardinality can be considered) and is such that a "child" string is
formed by taking a certain number of bits from one "parent" string and the complement from another
"parent" string. For example, one can form 1111 from parents 1010 and 0101 by taking the first and
third bits from 1010 and the complementary second and fourth bits from 0101. One may specify the
bits taken from the first parent using a recombination "mask", $m$. In the above example the
recombination mask is 1010, which signifies take bits one and three from the first parent
(specified by the position of the ones) and two and four from the second.

The resultant dynamical equation describing the evolution of the probability distribution for
this system is

$$P_i(t+1) = \sum_j V_{ij}\, P^c_j(t) \qquad (2)$$

where $P^c_j(t)$ is the probability to find strings of type $c_j$ after selection and crossover.
The mutation matrix, $V$, has matrix elements
$V_{ij} = p_m^{d^H(i,j)} (1-p_m)^{N-d^H(i,j)}$, where $d^H(i,j)$ is the Hamming distance between
the two strings. For mutation Hamming distance is clearly a very natural metric. Note that (2) also
applies for a finite population if we interpret the left hand side of (2) as the expected
proportion of genotype $c_i$ to be found at $t+1$ while any $P_i(t)$ on the right hand side are to
be considered as the actual proportions found at $t$.

2 This is not a restriction. The extension to non-fixed length strings and trees has been
considered by Poli and collaborators in the context of Genetic Programming (see [8] for a recent
exposition).

Explicitly $P^c_i(t)$ is given by

$$P^c_i(t) = P'_i(t) + \sum_{m=1}^{2^N} \sum_{j,k=1}^{2^N} \lambda_{ijk}(m)\, P'_j(t)\, P'_k(t) \qquad (3)$$
where $\lambda_{ijk}(m)$ is an interaction term between strings that depends on the particular
crossover mask $m$, and $\sum_{m=1}^{2^N}$ is the sum over all possible recombination masks,
$m \in M$, $M$ being the space of masks. Generically, $\lambda_{ijk}$ can be divided into two
terms, one associated with string destruction, $\lambda^d_{ijk}$, and the other, $\lambda^c_{ijk}$,
associated with string construction. Taking as target the string 111, an example of the former is
111 + 000 -> 110 + 001 while for the latter an example is the inverse of this process. To write
these processes more explicitly we denote the set of bits inherited by an offspring from parent
$c_j$ as $S$ and the bits inherited from parent $c_k$, i.e. the set $c_k - S$, by $C$. Naturally,
$S$ and $C$ both depend on the particular crossover mask chosen. Then,

$$\lambda^d_{ijk} = -p_c(m)\,\delta_{ik}(1-\delta_{ij})\,C^{(1)}_{c_i c_j}(m) = -p_c(m)\,\delta_{ik}(1-\delta_{ij})\,\theta(d^H_S(i,j))\,\theta(d^H_C(i,j)) \qquad (4)$$

and

$$\lambda^c_{ijk} = p_c(m)(1-\delta_{ij})(1-\delta_{ik})\,C^{(2)}_{c_j c_k}(m) = \frac{p_c(m)}{2}(1-\delta_{ij})(1-\delta_{ik})\left\{\delta(d^H_S(i,j))\,\delta(d^H_C(i,k)) + \delta(d^H_C(i,j))\,\delta(d^H_S(i,k))\right\} \qquad (5)$$

where $p_c(m)$ is the probability to implement the mask $m$ and the coefficients
$C^{(1)}_{c_i c_j}(m)$ and $C^{(2)}_{c_j c_k}(m)$ represent, respectively, the probability that,
given that $c_i$ was one of the parents, it is destroyed by the crossover process, and the
probability that, given that neither parent was $c_i$, it is created by recombination. Here
$\theta(x) = 1$ for $x > 0$ and $\theta(0) = 0$, while $\delta(x) = 1$ for $x = 0$ and is zero
otherwise. $d^H_S(i,j)$ is the Hamming distance between the strings $c_i$ and $c_j$ measured only
over the set $S$, with the other arguments in (4) and (5) being similarly defined. Note that
$C^{(1)}_{c_i c_j}(m)$ and $C^{(2)}_{c_j c_k}(m)$ are properties of the crossover process itself and
therefore population independent. It is clear that for recombination Hamming distance is not a
natural metric. For example, consider two parent strings 1111111111 and 0000000000. A one-point
crossover implemented between the last two bits leads to offspring 1111111110 and 0000000001,
which are Hamming distance one from the respective parents. An equally probable crossover between
the fifth and sixth bits, however, leads to 1111100000 and 0000011111, which are Hamming distance
five away from the parents. For a given $i$, $\lambda_{ijk}$ is a $2^N$-dimensional matrix, but is
very sparse, there being only $O(2^N)$ non-zero elements. Thus, the microscopic representation is
very inefficient, there being very few ways of creating a given target by recombination of strings.
The vast majority of string recombination events are neutral in that they lead to no non-trivial
interaction.
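The mask convention and the one-point crossover example above can be reproduced directly. This is a minimal sketch; the helper names `crossover`, `complement` and `hamming` are illustrative, and a mask bit of 1 is taken to mean "inherit this locus from the first parent", as in the text.

```python
def crossover(parent_a, parent_b, mask):
    """Child takes parent_a's bit where the mask is '1' and parent_b's bit elsewhere."""
    if not (len(parent_a) == len(parent_b) == len(mask)):
        raise ValueError("parents and mask must have equal length")
    return "".join(a if m == "1" else b
                   for a, b, m in zip(parent_a, parent_b, mask))

def complement(mask):
    """Mask that swaps the roles of the two parents."""
    return "".join("0" if m == "1" else "1" for m in mask)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# The example from the text: mask 1010 builds 1111 from parents 1010 and 0101.
child = crossover("1010", "0101", "1010")

# One-point crossover of 1111111111 and 0000000000 at two different cut points.
ones, zeros = "1" * 10, "0" * 10
offspring = {}
for cut in (9, 5):
    mask = "1" * cut + "0" * (10 - cut)
    first = crossover(ones, zeros, mask)
    second = crossover(ones, zeros, complement(mask))
    offspring[cut] = (first, second, hamming(first, ones), hamming(second, zeros))
    print(mask, offspring[cut])
```

Two equally probable masks move the offspring by very different Hamming distances, which is the sense in which Hamming distance is not a natural metric for recombination.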

The equations (2) and (3) yield an exact expression for the probability distribution governing 
the evolution for arbitrary selection, mutation and crossover. It takes into account exactly the 
effects of destruction and construction of strings. 
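One full generation of selection, crossover and mutation can be sketched for two-bit strings. The code below is an illustration under stated assumptions rather than the article's own implementation: the crossover step enumerates parent pairs by brute force instead of using $\lambda_{ijk}$, and the initial distribution, fitness values, mask probabilities and mutation rate are hypothetical inputs.

```python
from fractions import Fraction
from itertools import product

N = 2
STRINGS = ["".join(bits) for bits in product("01", repeat=N)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def select(P, fitness):
    """Proportional selection: P'_i = f_i P_i / f_bar."""
    f_bar = sum(fitness[c] * P[c] for c in STRINGS)
    return {c: fitness[c] * P[c] / f_bar for c in STRINGS}

def recombine(P_sel, mask_prob):
    """P^c_i by brute force over parent pairs; mask_prob maps mask -> p_c(m)."""
    p_c = sum(mask_prob.values())
    P_c = {c: (1 - p_c) * P_sel[c] for c in STRINGS}   # no crossover applied
    for mask, p_mask in mask_prob.items():
        for a, b in product(STRINGS, repeat=2):
            child = "".join(x if m == "1" else y for x, y, m in zip(a, b, mask))
            P_c[child] += p_mask * P_sel[a] * P_sel[b]
    return P_c

def mutate(P_c, p_m):
    """Equation (2) with V_ij = p_m^d (1 - p_m)^(N - d), d the Hamming distance."""
    def V(i, j):
        d = hamming(i, j)
        return p_m ** d * (1 - p_m) ** (N - d)
    return {i: sum(V(i, j) * P_c[j] for j in STRINGS) for i in STRINGS}

# Hypothetical inputs: uniform start, fitness 1 + (number of ones), a single mask.
P0 = {c: Fraction(1, 4) for c in STRINGS}
fitness = {c: 1 + c.count("1") for c in STRINGS}
P_sel = select(P0, fitness)
P_c = recombine(P_sel, {"10": Fraction(1, 2)})
P1 = mutate(P_c, Fraction(1, 100))
print(P_c["11"], sum(P1.values()))
```

Exact rational arithmetic is used so that normalization can be checked without rounding error.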



3 Coarse-Grained Evolution Equations 

The dynamics of the previous section is described by 2 N coupled, non-linear difference equa- 
tions representing the microscopic degrees of freedom, i.e. the strings themselves. In the absence 
of recombination, the equations are essentially linear and, as is well known, the resulting selec- 
tion/mutation problem can be recast in the guise of a two-dimensional, inhomogeneous statistical 
mechanics problem, where powerful techniques such as the transfer matrix approach can be in- 
voked. However, save in very simple problems, such as a linear fitness landscape, even this 
simpler problem is formidable. Recombination adds an extra layer of complexity. Naturally, in 
such problems one always wishes to find the correct effective degrees of freedom so as to be able 
to achieve an effective reduction in the dimensionality of the problem. Such reductions can
sometimes come about in a relatively trivial fashion, for instance, if there is an underlying
symmetry that is preserved by the action of the genetic operators. This occurs for instance with
selection and the genotype-phenotype map. As fitness acts at the phenotypic level, a natural coarse
graining from genotypes to phenotypes occurs. This symmetry is not necessarily preserved by
the other genetic operators. As a concrete example consider a fitness landscape where the fitness
is given by the number of ones on the string (a simple paramagnet). In this case the dynamics
can be rewritten in terms of the $N+1$ phenotypes rather than $2^N$ genotypes. The equation of
motion for selection only is then

$$P_n(t+1) = \frac{n}{\bar n(t)}\,P_n(t) \qquad (6)$$

where we denote phenotypes by $n$, the number of ones, and $\bar n(t)$ is the average number of ones
in the population at time $t$. The solution of these $N+1$ difference equations is

$$P_n(t) = \frac{n^t\,P_n(0)}{\sum_{n'=0}^{N} n'^{\,t}\,P_{n'}(0)} \qquad (7)$$

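The selection-only solution (7) for the paramagnet landscape can be checked against direct iteration of (6). This is a minimal sketch, assuming $N = 4$ and a uniform initial phenotype distribution, both of which are arbitrary illustrative choices.

```python
from fractions import Fraction

def select_step(P):
    """Equation (6): P_n(t+1) = n P_n(t) / n_bar(t), with P indexed by n = 0..N."""
    n_bar = sum(n * p for n, p in enumerate(P))
    return [n * p / n_bar for n, p in enumerate(P)]

def closed_form(P0, t):
    """Equation (7): P_n(t) = n^t P_n(0) / sum_n' n'^t P_n'(0)."""
    weights = [n ** t * p for n, p in enumerate(P0)]
    total = sum(weights)
    return [w / total for w in weights]

N = 4
P0 = [Fraction(1, N + 1)] * (N + 1)   # hypothetical uniform initial condition
P = P0
for _ in range(3):
    P = select_step(P)
print(P == closed_form(P0, 3))
```

The phenotype $n = 0$ has zero fitness on this landscape and is removed after a single generation.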
Another example is that of the Eigen model, where the fitness landscape is degenerate for all
genotypes except one, the master sequence. At the level of selection only, given that there are
only two phenotypes, there is a reduction in the size of the configuration space from $2^N$ to 2,
i.e. a reduction in the number of degrees of freedom from $N$ to 1. However, if we include
the effect of mutation we see there is an induced breaking of the genotype-phenotype symmetry
due to the fact that strings close to the master sequence in Hamming distance have a higher
"effective" fitness [9].

As mentioned in the previous section, the string representation for recombination is very
inefficient due to the sparsity of the interaction matrix. This is an indication that strings are not
the natural effective degrees of freedom for recombination. So what are? To form the string 111
with a recombination mask 100 one can join strings 111, 110, 101 and 100 with either 111 or
011. In other words, for the first parent the second and third bit values are unimportant and for
the second the first bit value is unimportant. Thus, it is natural to coarse grain over those strings
that give rise to the desired target for a given mask. Such coarse-grained variables are known
as "schemata", which we will denote by $\xi$, and are equivalent to, for instance, "block spins" in
traditional statistical mechanics RG applications. The marginal probability, $P(\xi_i, t)$,
represents the probability of finding the schema $\xi_i$ at time $t$. A specific schema is
determined by summing over those bit positions that are not part of the schema. One may denote such
a bit position by a *. Thus, 11* represents the two strings 111 and 110. The number of definite
bits of the schema defines its order, $N_2$, while the number of loci spanned by its outermost
defining bits (both included) defines its length, $l$. Thus, *11**0** has $N_2 = 3$ and $l = 5$.
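The schema definitions above translate into a few lines of code. In this sketch the function names are illustrative, and the length is counted inclusively so that it reproduces the value $l = 5$ quoted in the text.

```python
from itertools import product

def matches(schema, string):
    """True when the string agrees with the schema on every defined locus."""
    return all(s in ("*", b) for s, b in zip(schema, string))

def members(schema):
    """All strings that the schema coarse grains over."""
    candidates = ("".join(bits) for bits in product("01", repeat=len(schema)))
    return [c for c in candidates if matches(schema, c)]

def order(schema):
    """Number of defined bits, N_2."""
    return sum(s != "*" for s in schema)

def length(schema):
    """Number of loci spanned by the outermost defined bits, both included."""
    defined = [i for i, s in enumerate(schema) if s != "*"]
    return defined[-1] - defined[0] + 1 if defined else 0

def schema_probability(schema, P):
    """Marginal probability of a schema given string probabilities P."""
    return sum(p for string, p in P.items() if matches(schema, string))

print(members("11*"), order("*11**0**"), length("*11**0**"))
```

For a uniform distribution over three-bit strings, `schema_probability("11*", P)` returns one quarter, since two of the eight strings belong to the schema.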

As there exist $3^N$ possible schemata a full schemata basis is overcomplete as well as being
non-orthonormal and corresponds to the space of all possible blocked spins. However, the space
of schemata is not the natural one for recombination as we shall see. If one picks arbitrarily
a vertex in $\Omega$, associated with a string $c_i$, one may perform a linear coordinate
transformation $\Lambda : \Omega \to \tilde\Omega$ to a basis consisting of all schemata that
contain $c_i$. For instance, for two bits $\Omega = \{11, 10, 01, 00\}$, while
$\tilde\Omega = \{11, 1*, *1, **\}$. The invertible matrix $\Lambda$ is such that
$\Lambda_{ij} = 1$ if $c_j \in \xi_i$ and $\Lambda_{ij} = 0$ otherwise. We denote the associated
coordinate basis the Building Block basis (BBB). The BBB is complete but clearly not orthonormal.
Note that the vertex $c_i$ by construction is a fixed point of this transformation. Apart from the
vertex $c_i$, note that points in $\tilde\Omega$ correspond to higher dimensional objects in
$\Omega$. For instance, 1* and *1 are one-planes in $\Omega$ while ** is the whole space. In the
BBB one finds

$$\tilde P^c_i(t) = \tilde P'_i(t) + \sum_{m=1}^{2^N} \sum_{j,k} \tilde\lambda_{ijk}(m)\, \tilde P'_j(t)\, \tilde P'_k(t) \qquad (8)$$

where $\tilde P_i = \Lambda_{ij} P_j$ denotes a probability in the BBB and
$\tilde\lambda_{ijk}(m) = \Lambda_{ii'}\,\lambda_{i'j'k'}(m)\,\Lambda^{-1}_{j'j}\,\Lambda^{-1}_{k'k}$.
$\tilde\lambda_{ijk}(m)$ has the property that for a given mask only interactions between BBs that
construct the target schema are non-zero, i.e. $\tilde\lambda_{ijk}(m) = 0$ unless $k$ corresponds
to a schema which is the complement of $j$ with respect to $i$. For example, for two bits, if we
choose as vertex 11, then 11 may interact only with **, while 1* may interact only with *1. In
$\Omega$ this has the interesting interpretation that for a target schema $\xi$ of dimensionality
$(N-d)$ only geometric objects "dual" in the $d$-dimensional subspace of $\Omega$ that corresponds
to $\xi$ may interact, i.e. a $k$-dimensional object recombines only with a $(N-d-k)$-dimensional
object. Additionally, a $(N-d)$-dimensional object may only be formed by the interaction of
higher dimensional objects. In this sense interaction is via the geometric intersection of higher
dimensional objects. For example, the point 11 can be formed by the intersection of the two lines
1* and *1. Similarly, 1111 can be formed via intersection of the three-plane 1*** with the line
*111 or via the intersection of the two two-planes 11** and **11.
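For two bits the transformation to the Building Block basis can be written out explicitly. The sketch below assumes the vertex 11 and the orderings of $\Omega$ and $\tilde\Omega$ used in the text; the inverse map is worked out by hand for this case and the probabilities are hypothetical.

```python
from fractions import Fraction

STRINGS = ["11", "10", "01", "00"]      # Omega
BLOCKS = ["11", "1*", "*1", "**"]       # schemata containing the vertex 11

def contains(schema, string):
    return all(s in ("*", b) for s, b in zip(schema, string))

# Lambda_ij = 1 if string j belongs to building block i, and 0 otherwise.
LAMBDA = [[1 if contains(xi, c) else 0 for c in STRINGS] for xi in BLOCKS]

def to_building_blocks(P):
    """P_tilde_i = sum_j Lambda_ij P_j, with P listed in the order of STRINGS."""
    return [sum(l * p for l, p in zip(row, P)) for row in LAMBDA]

def to_strings(P_tilde):
    """Inverse map for two bits, obtained by inclusion-exclusion."""
    p11, p1x, px1, pxx = P_tilde
    return [p11, p1x - p11, px1 - p11, pxx - p1x - px1 + p11]

P = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
P_tilde = to_building_blocks(P)
print(LAMBDA, P_tilde)
```

The matrix is lower triangular with unit diagonal in this ordering, which makes its invertibility evident, and the first component is unchanged, reflecting that the vertex is a fixed point.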

Given that the object dual to a vertex is always the trivial schema, *...*, where all bits are
coarse grained, and $P(*...*, t) = 1$, it is instructive to combine the term linear in $P'_i(t)$
with its pure selection counterpart. One obtains for an arbitrary string $c_i$

$$P_c(c_i,t) = (1-p_c)\,P'(c_i,t) + \sum_{m=1}^{2^N} p_c(m)\,P'(c_i^S(m),t)\,P'(c_i^C(m),t) \qquad (9)$$

where $p_c = \sum_m p_c(m)$ and we have returned to a less abstract notation. $P'(c_i^S(m),t)$ is
the probability to select the BB $c_i^S(m)$ and $P'(c_i^C(m),t)$ the probability to select the BB
$c_i^C(m)$. Both $c_i^S(m)$ and $c_i^C(m)$ are elements of the BBB. The above equation clearly
shows that recombination is most naturally considered in terms of the BBB. The $(2^N - 1)$
destruction terms associated with $\lambda^d_{ijk}$ have been reduced to only one term while the
$(2^N - 1)^2$ construction terms have also been reduced to one term. Of course, we must remember
that the coarse grained averages of $c_i^S(m)$ and $c_i^C(m)$ contain $2^N$ terms; still, the
reduction in complication is enormous. Thus, we see that recombination as an operator naturally
introduces the idea of a coarse graining.
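The reduction from many string-basis terms to one term per mask can be checked numerically. The sketch below compares a string-basis sum, with destruction and construction coefficients written as indicator functions of restricted Hamming distances, against the building-block form (9). The string length, selected probabilities and mask probabilities are hypothetical, and the coefficient conventions are an assumption of this sketch.

```python
from fractions import Fraction
from itertools import product

N = 3
STRINGS = ["".join(bits) for bits in product("01", repeat=N)]

def d_on(sites, a, b):
    """Hamming distance restricted to the given loci."""
    return sum(a[s] != b[s] for s in sites)

def split(mask):
    """Loci taken from the first parent (S) and from the second parent (C)."""
    S = [s for s in range(N) if mask[s] == "1"]
    C = [s for s in range(N) if mask[s] == "0"]
    return S, C

def string_basis(P_sel, mask_prob):
    out = {}
    for i in STRINGS:
        total = P_sel[i]
        for mask, p in mask_prob.items():
            S, C = split(mask)
            for j in STRINGS:
                if j == i:
                    continue
                # Destruction: the partner differs from c_i on both S and C.
                if d_on(S, i, j) > 0 and d_on(C, i, j) > 0:
                    total -= p * P_sel[i] * P_sel[j]
                for k in STRINGS:
                    if k == i:
                        continue
                    # Construction: one parent supplies S, the other supplies C.
                    hits = ((d_on(S, i, j) == 0 and d_on(C, i, k) == 0)
                            + (d_on(C, i, j) == 0 and d_on(S, i, k) == 0))
                    total += p * Fraction(hits, 2) * P_sel[j] * P_sel[k]
        out[i] = total
    return out

def building_block_basis(P_sel, mask_prob):
    """Equation (9): one destruction and one construction term per mask."""
    p_c = sum(mask_prob.values())
    out = {}
    for i in STRINGS:
        total = (1 - p_c) * P_sel[i]
        for mask, p in mask_prob.items():
            S, C = split(mask)
            P_S = sum(P_sel[j] for j in STRINGS if d_on(S, i, j) == 0)
            P_C = sum(P_sel[j] for j in STRINGS if d_on(C, i, j) == 0)
            total += p * P_S * P_C
        out[i] = total
    return out

P_sel = {c: Fraction(n + 1, 36) for n, c in enumerate(STRINGS)}
masks = {"100": Fraction(3, 10), "110": Fraction(3, 10)}   # one-point crossover
print(string_basis(P_sel, masks) == building_block_basis(P_sel, masks))
```

The two routines agree exactly, while the second needs only two marginal probabilities per mask.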

Inserting (9) in (2) we can then try to solve for the dynamics. However, in order to do that we
must know the time dependence of the BB schemata $c_i^S(m)$ and $c_i^C(m)$. Although the number
of BB basis elements is $2^N$ we may generalize and consider the evolution of an arbitrary schema,
$\xi$. To do this we need to sum with $\sum_{c_i \in \xi}$ on both sides of the equation (2). This
can simply be done to obtain [3-5] again the form (2), where this time the index $i$ runs only over
the $2^{N_2}$ elements of the schema partition and where again
$V_{ij} = p_m^{d^H(i,j)} (1-p_m)^{N_2 - d^H(i,j)}$, the string length $N$ being replaced by the
schema order $N_2$. In this case however $d^H(i,j)$ is the Hamming distance between the two
schemata. For instance, for three bit strings the schemata partition associated with the first and
third bits is {1*1, 1*0, 0*1, 0*0}. In this case $d^H(1,2) = 1$ and $d^H(1,4) = 2$.
$P_c(\xi, t) = \sum_{c_i \in \xi} P_c(c_i, t)$ is the probability of finding the schema $\xi$ after
selection and crossover. Note the form invariance of the equation after coarse graining. To
complete the transformation to schema dynamics we need the schema analog of (9).




This also can be obtained by acting with $\sum_{c_i \in \xi}$ on both sides of the equation. One
obtains

$$P_c(\xi,t) = \left(1 - p_c\,\frac{N_{M_r}(\xi)}{N_M}\right) P'(\xi,t) + \sum_{m \in M_r} p_c(m)\,P'(\xi^S(m),t)\,P'(\xi^C(m),t) \qquad (10)$$

where $\xi^S(m)$ represents the part of the schema $\xi$ inherited from the first parent and
$\xi^C(m)$ that part inherited from the second. $N_{M_r}(\xi)$ is the number of crossover masks
that affect $\xi$, $M_r$ being the set of such masks. $N_M$ is the total number of masks with
$p_c(m) \neq 0$. Obviously, these quantities depend on the type of crossover implemented and on
properties of the schema such as defining length.
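The renormalization factor $N_{M_r}(\xi)/N_M$ in (10) is easy to compute for a specific operator. The sketch below assumes one-point crossover with all $N - 1$ cut points equally likely; the function names are illustrative.

```python
from fractions import Fraction

def one_point_masks(n_bits):
    """Masks for one-point crossover: ones before the cut, zeros after it."""
    return ["1" * cut + "0" * (n_bits - cut) for cut in range(1, n_bits)]

def affects(mask, schema):
    """A mask affects a schema when its defined bits come from both parents."""
    sources = {m for m, s in zip(mask, schema) if s != "*"}
    return len(sources) == 2

def renormalized_crossover_probability(schema, p_c):
    """p_c N_Mr / N_M, assuming every mask is implemented with equal probability."""
    masks = one_point_masks(len(schema))
    n_affecting = sum(affects(mask, schema) for mask in masks)
    return p_c * Fraction(n_affecting, len(masks))

print(renormalized_crossover_probability("*11**0**", Fraction(1)))
```

For the schema *11**0** four of the seven cut points fall between the outermost defined bits, while an order one schema is affected by none of them.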

Thus, we see that the evolution equation for schemata is form invariant there being only a 
simple multiplicative renormalization of the crossover probability p c . This form invariance, first 
shown in [3], demonstrates that BB schemata in general are a preferred set of coarse grained 
variables and more particularly the BBB is a preferred basis in the presence of recombination. It 
has also been shown [10] that schemata, more generally, are the only coarse graining that leads 
to invariance in the presence of mutation and recombination. 

Considering again the structure of (9) and (10) we see that variables associated with a certain
degree of coarse graining are related to BB "precursors" at an earlier time, which in their turn
are related to their own precursors, and so on. This hierarchical structure terminates at order one
BBs as these are unaffected by crossover. Thus, for example, the level one BB combinations of 111,
i.e. BBs that lead directly upon recombination to 111, are: 11* : **1, 1*1 : *1* and 1** : *11.
The level two BBs are 1**, *1* and **1. Thus a typical construction process is that BBs 1** and
*1* recombine at $t = t_1$ to form the BB 11* which at some later time $t_2$ recombines with the
BB **1 to form the string 111.
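The building-block hierarchy of a target can be enumerated mechanically. This is a minimal sketch; the function name `building_block_pairs` is illustrative and all non-trivial masks are allowed.

```python
from itertools import product

def building_block_pairs(target):
    """Unordered pairs of complementary building blocks that recombine into target."""
    defined = [i for i, s in enumerate(target) if s != "*"]
    pairs = set()
    for from_first in product((True, False), repeat=len(defined)):
        if all(from_first) or not any(from_first):
            continue                      # trivial mask: one parent supplies everything
        first = ["*"] * len(target)
        second = ["*"] * len(target)
        for take_first, i in zip(from_first, defined):
            (first if take_first else second)[i] = target[i]
        pairs.add(frozenset(("".join(first), "".join(second))))
    return pairs

level_one = building_block_pairs("111")
print(sorted(sorted(pair) for pair in level_one))
print(building_block_pairs("11*"), building_block_pairs("1**"))
```

Applying the function again to each block descends one level, and the recursion stops at order one blocks, for which the set of pairs is empty.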

4 Renormalization Group 

In the previous section we saw that coarse grained variables arise very naturally in genetic
dynamics and gave as examples the genotype-phenotype map and schemata. We can formalize
these considerations by introducing a general coarse graining operator $\mathcal{R}(\eta, \eta')$
which coarse grains from the variable $\eta$ to the variable $\eta'$. In this case

$$\mathcal{R}(\eta,\eta')P(\eta,t) = P(\eta',t), \qquad \mathcal{R}(\eta,\eta'')P(\eta,t) = P(\eta'',t) \qquad (11)$$

However, given that $\mathcal{R}(\eta',\eta'')P(\eta',t) = P(\eta'',t)$ we deduce that

$$\mathcal{R}(\eta,\eta'') = \mathcal{R}(\eta,\eta')\,\mathcal{R}(\eta',\eta'') \qquad (12)$$

i.e. the space of coarse grainings has a semi-group structure. Thus, we see that one can
naturally introduce the RG into the study of genetic dynamics. The naturalness of a particular RG
transformation will be to a large extent determined by how the dynamics looks under this coarse
graining. Considering (1) for the pdf of the dynamics, then given that
$\mathcal{R}(\eta,\eta')P(\eta,t) = P(\eta',t)$ the dynamics under a coarse graining is governed by
$\mathcal{R}(\eta,\eta')\mathcal{H}_\eta P(\eta,t)$, where $\mathcal{H}_\eta$ is the dynamical
operator associated with the variables $\eta$. If this can be written in the form
$\mathcal{H}_{\eta'}P(\eta',t)$ with suitable renormalizations then the dynamics is form covariant
or invariant under this coarse graining. As we have seen, for selection only the dynamics is
invariant (see footnote 3) when passing from genotypic to phenotypic variables, while for schemata
the whole dynamics is form invariant, although there is a non-trivial renormalization of the
fitness landscape as well as a simple renormalization of the recombination probability. In the
case of recombination note also that the coarse graining operator associated with the BBs satisfies

$$\mathcal{R}(\eta,\eta') = \mathcal{R}(\eta^S,\eta'^S)\,\mathcal{R}(\eta^C,\eta'^C) \qquad (13)$$

where $\mathcal{R}(\eta^S,\eta'^S)$ represents the action of the coarse graining on the BB $S$
while $\mathcal{R}(\eta^C,\eta'^C)$ represents the action on the BB $C$.

3 In this case it is strictly invariant, not just form invariant, as there is no renormalization
necessary of any parameter or variable.
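The semi-group property (12) can be illustrated with schemata, where coarse graining amounts to marginalizing over loci. The sketch below assumes three-bit strings with hypothetical probabilities; `coarse_grain` is an illustrative name for the action of $\mathcal{R}$ on a distribution.

```python
from fractions import Fraction
from itertools import product

def coarse_grain(P, keep):
    """Marginalize a distribution, keeping only the loci listed in `keep`."""
    out = {}
    for config, p in P.items():
        blocked = "".join(c if i in keep else "*" for i, c in enumerate(config))
        out[blocked] = out.get(blocked, 0) + p
    return out

strings = ["".join(bits) for bits in product("01", repeat=3)]
P = {c: Fraction(n + 1, 36) for n, c in enumerate(strings)}

# Strings -> schemata on the first and third bits -> schemata on the first bit.
two_step = coarse_grain(coarse_grain(P, {0, 2}), {0})
# Strings -> schemata on the first bit, in a single coarse graining.
direct = coarse_grain(P, {0})
print(direct, two_step == direct)
```

Composing two coarse grainings gives the same distribution as the single coarser one, and no inverse operation is defined, which is why the structure is a semi-group rather than a group.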

5 Results from the coarse grained formalism 

One of the strengths of the present coarse grained formulation is that much can be deduced simply
by inspection of the hierarchical structure of the basic formulas. Introducing a
$2^N$-dimensional population vector, $\mathbf{P}(t)$, whose elements are $P(c_i, t)$,
$i = 1, \ldots, 2^N$, equation (2) can then be written in the form

$$\mathbf{P}(t+1) = W_s(t)\,\mathbf{P}(t) + \sum_{m=1}^{2^N} p_c(m)\,V\,\mathbf{j}(m,t) \qquad (14)$$

where the selection-crossover destruction-mutation matrix $W_s(t) = V F(t)$, with $V$ the mutation
matrix of (2). The selection-crossover destruction matrix, $F(t)$, is diagonal, and takes into
account selection and the destructive component of crossover. Explicitly, for proportional
selection $F_{ii}(t) = (f(c_i)/\bar f(t))(1 - p_c)$. Finally, the "source" vector
$\mathbf{j}(m,t)$ has elements $j_i(m,t) = P'(c_i^S(m),t)\,P'(c_i^C(m),t)$. The interpretation of
this equation is that $\mathbf{j}(m,t)$ is a source which creates strings (or schemata) by bringing
BBs together, while the first term on the right hand side tells us how the strings themselves are
propagated into the next generation, the destructive effect of crossover renormalizing the fitness
of the strings.

As shown in [3-5], compared to a representation based on (3), even a formal solution of
(9) in the absence of mutation and for 1-point crossover and proportional selection yields much
valuable qualitative information, such as a simple proof of Geiringer's theorem and an extension
of it to the weak selection regime. Explicitly, we have

$$\mathbf{P}(t) = \prod_{n=0}^{t-1} W_s(n)\,\mathbf{P}(0) + \sum_{m=1}^{2^N} p_c(m) \sum_{n=0}^{t-1}\, \prod_{i=n+1}^{t-1} W_s(i)\,V\,\mathbf{j}(m,n) \qquad (15)$$

Due to the form invariance of the equations this solution actually holds true for arbitrary
schemata. The only changes are that the vectors are of dimension $2^{N_2}$, the matrices of
dimension $2^{N_2} \times 2^{N_2}$, the sum over masks for the construction terms is only over the
set $M_r$ and that the BBs in $\mathbf{j}(m,t)$ are those of the schema rather than the entire
string.

The interpretation of (15) follows naturally from that of (14). Considering first the case
without mutation, the first term on the right hand side gives us the probability for propagating a
string or schema from $t = 0$ to $t$ without being destroyed by crossover. In other words
$\prod_{n=0}^{t-1} W_s(n)$ is the Green's function or propagator for $\mathbf{P}$. In the second
term, $\mathbf{j}(m,n)$, each element is associated with the creation of a string or schema at time
$n$ via the juxtaposition of two BBs associated with a mask $m$. The factor
$\prod_{i=n+1}^{t-1} W_s(i)$ is the probability to propagate the resultant string or schema without
crossover destruction from its creation at time $n$ to $t$. The sum over masks and $n$ is simply
the sum over all possible creation events in the dynamics.
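The propagator structure of the formal solution can be checked on a generic recursion of the same shape, $\mathbf{P}(n+1) = W(n)\mathbf{P}(n) + \mathbf{s}(n)$. In this sketch the matrices `W(n)` and sources `s(n)` are arbitrary hypothetical stand-ins for $W_s(n)$ and $\sum_m p_c(m) V \mathbf{j}(m,n)$, not quantities derived from a genetic model.

```python
from fractions import Fraction

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def iterate(W, source, P0, t):
    """Apply P(n+1) = W(n) P(n) + source(n) for n = 0, ..., t-1."""
    P = P0
    for n in range(t):
        P = add(mat_vec(W(n), P), source(n))
    return P

def formal_solution(W, source, P0, t):
    """Propagate P(0) from 0 to t and each source from its creation time n to t."""
    def propagate(v, start):
        for i in range(start, t):         # later times act to the left
            v = mat_vec(W(i), v)
        return v
    total = propagate(P0, 0)
    for n in range(t):
        total = add(total, propagate(source(n), n + 1))
    return total

def W(n):
    return [[Fraction(1, n + 2), 0], [Fraction(1, 3), Fraction(n + 1, n + 3)]]

def s(n):
    return [Fraction(1, n + 1), Fraction(n, 5)]

P0 = [Fraction(1, 2), Fraction(1, 2)]
print(iterate(W, s, P0, 5) == formal_solution(W, s, P0, 5))
```

A source created at time $n$ first appears in $\mathbf{P}(n+1)$, which is why its propagator starts at $i = n + 1$.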
This formulation lends itself very naturally to a diagrammatic representation and the
formulation of a set of "Feynman rules" which allow for the calculation of $P(\xi,t)$ in the BBB.
Here for simplicity and transparency we write them for $p_m = 0$. The generalization to "matrix"
propagators in the presence of mutation is straightforward.

1) Draw all possible connected tree diagrams that contribute to $\xi$

2) For each diagram to each internal line attach a propagator

$$F_{ii}(t,t') = \prod_{n=t'}^{t-1} \frac{f(\xi_i,n)}{\bar f(n)}\left(1 - p_c\,\frac{N_{M_r}(\xi_i)}{N_M}\right)$$

3) To each vertex assign a weight

$$\lambda_{ijk} = p_c(m)\,\frac{f(\xi_j,t')}{\bar f(t')}\,\frac{f(\xi_k,t')}{\bar f(t')}\,\delta(d_k + d_i - (N - d_j))$$

4) Carry out the integration over time for all vertices

In the above $d_i$ represents the dimensionality of the schema $i$. As a simple example
consider two bit strings and $p_c = 1$. Consider the pdf for 11. In this case there are only two
diagrams: the diagram corresponding to propagation of 11 itself from $t = 0$ to $t$ and the
formation of 11 by recombination of 1* and *1 at time $t'$, $0 < t' < t$. In this case
$F_{11}(t,0) = 0$, $F_{1*}(t,0) = \prod_{n=0}^{t-1} (f(1*,n)/\bar f(n))$ and similarly for *1. For
the interactions the only non-zero vertex is
$\lambda_{11,1*,*1} = (f(1*,n)/\bar f(n))(f(*1,n)/\bar f(n))$. The diagrammatic series is naturally
a perturbation series in the number of BB recombination events. In the case of a flat fitness
landscape the entire diagrammatic series can be exactly resummed for an arbitrary string in the
continuous time limit to find [6]

$$P(c_i,t) = \sum_{n=0}^{N-1} e^{-\frac{n p_c t}{N-1}}\left(1 - e^{-\frac{p_c t}{N-1}}\right)^{N-n-1} \mathcal{P}(n+1) \qquad (16)$$

where $\mathcal{P}(n+1)$ is an initial condition and represents a partition over the probabilities
for finding $(N-n)$ building blocks at $t = 0$. For a given $n$ there are $^{N-1}C_n$ such terms.
For instance, for 11, $\mathcal{P}(2) = P(11, 0)$ and $\mathcal{P}(1) = P(1*, 0)P(*1, 0)$. Note
that the simple dynamical form arises because of the use of the BBB. One can use $\Lambda$ to
rewrite the above result in a string basis. The resulting expression is far more complicated in
that the dynamical factors for a given string combination are complicated combinations of those
associated with the BBB. Note also that the fixed point is non-perturbative in $p_c$, indicating
that the asymptotic dynamics cannot be accessed perturbatively. This is why it was necessary to sum
the entire diagrammatic series.
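For two bits the flat-landscape result can be checked by hand. The sketch below iterates (9) in discrete time and compares it with a discrete-time closed form derived here for this special case only, namely $P(11,t) = (1-p_c)^t P(11,0) + [1-(1-p_c)^t]\,P(1*,0)P(*1,0)$; it also evaluates (16) for $N = 2$. The initial distribution and $p_c$ are hypothetical.

```python
import math
from fractions import Fraction

def step(P, p_c):
    """One generation of equation (9) for two bits, flat landscape, single mask 10."""
    first = {x: P[x + "0"] + P[x + "1"] for x in "01"}     # P(x*)
    second = {y: P["0" + y] + P["1" + y] for y in "01"}    # P(*y)
    return {c: (1 - p_c) * P[c] + p_c * first[c[0]] * second[c[1]] for c in P}

def discrete_solution(P0, p_c, t):
    """Closed form of the two-bit recursion for 11; the block marginals are constant."""
    blocks = (P0["10"] + P0["11"]) * (P0["01"] + P0["11"])
    decay = (1 - p_c) ** t
    return decay * P0["11"] + (1 - decay) * blocks

def continuous_solution(P0, p_c, t):
    """Equation (16) for N = 2."""
    blocks = (P0["10"] + P0["11"]) * (P0["01"] + P0["11"])
    decay = math.exp(-p_c * t)
    return decay * P0["11"] + (1 - decay) * blocks

# Fully correlated start: only 11 and 00 are present.
P0 = {"11": Fraction(1, 2), "10": Fraction(0), "01": Fraction(0), "00": Fraction(1, 2)}
p_c = Fraction(1, 2)
P = P0
for _ in range(6):
    P = step(P, p_c)
print(P["11"], discrete_solution(P0, p_c, 6), continuous_solution(P0, p_c, 100))
```

Both forms relax to the product of the order one block probabilities, here one quarter, which is the decorrelated fixed point.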

The tendency of recombination is to destroy correlations between different loci in the
population. Selection, depending on the landscape, can induce correlations, hence there is a
competition. In the case of weak selection and strong crossover (16) shows that correlations
asymptotically decay and hence the effective degrees of freedom are 1-schemata. Higher order
schemata can be taken as perturbations around this decorrelated limit. In the case of a
recombination operator that mixes freely all bits within the entire population (genepool
recombination) these perturbations are zero and the 1-schemata give an exact description of the
dynamics. Under these circumstances one may solve the dynamics exactly, including mutation, for
certain fitness landscapes, such as a linear fitness landscape [7].

6 Conclusions 

We have here briefly tried to lay out why RG concepts and techniques can be useful when
studying the dynamics of genetic systems. I showed that genetic dynamics can be profitably studied
in the context of coarse grained degrees of freedom. What constitutes a natural coarse graining was
seen to depend on the genetic operators present. We saw that for recombination the equations of
motion were form invariant under a schemata coarse graining and that the BBB was a preferred
basis, leading to a recombination dynamics where effective degrees of freedom are related to more
coarse grained BBs. We showed that the hierarchical nature of recombination led to a natural
formulation in terms of Feynman diagrams with an associated set of Feynman rules. We also briefly
mentioned some concrete results that emerge naturally from a coarse grained formulation. We
strongly believe that the RG has an important role to play in developing a more quantitative
understanding of the dynamics of genetic systems and hope there will be interesting developments
to report at the next RG conference.